This microbook is a summary/original review based on the book: Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications
ISBN: 978-1-098-10796-3
Publisher: O'Reilly Media
Have you ever had that frustrating feeling of building a Machine Learning model that looks flawless in your tests, only to watch it fall apart the moment it reaches the end user? That happens because there's a massive gap between the controlled environment of a lab or an online course and the raw reality of production. In academia, the data is clean and frozen in time, and the only goal is to beat a precision benchmark. In real life, data is messy, changes every second, and what actually matters is whether the system is reliable, scalable, and affordable to maintain.
This microbook, based on Chip Huyen's extensive experience at Stanford and in the industry, will show you that building a successful ML system is only about 10% model code and 90% infrastructure and data. We'll dive into an iterative process that will transform the way you see technology, moving from fragile prototypes to robust applications that solve real business problems.
ML development isn't a straight line — it's a constant cycle of learning and improvement where every piece of the system needs to work in harmony with every other piece. The payoff here is clarity: you'll stop treating ML as an isolated trick and start managing projects as serious, high-impact engineering. Along the way, you'll learn how to align what the algorithm does with what the business actually needs to grow or profit. It's not just about complex algorithms — it's about building a machine that learns and adapts to the real world without breaking every time something changes.
Many companies make the mistake of assuming Machine Learning solves any problem automatically. The first thing you need to understand is that ML is not a magic solution. It works well only when the problem involves complex patterns that shift over time. If a simple "if this, then that" rule solves the case, use the simple rule. The cost of maintaining an ML system is far too high to incur without need.
When you do decide to go down this path, you need to know that ML systems design is not linear. It works as an infinite loop. If you discover something new during monitoring, it forces you to go back and revisit your data or your training.
Another crucial point is metric alignment. It doesn't matter if your model has 99% accuracy if it takes ten seconds to respond and the user leaves before that. Latency and compute cost are real-world priorities. Uber, for example, created the Michelangelo platform to standardize this flow. They realized they needed infrastructure that would let them test and ship models quickly and safely.
To replicate that success, always start by defining what success means for the business before looking at code accuracy. If the goal is to increase customer retention, your ML metric should be directly tied to that. At your next project meeting, ask what the financial impact of a model error would be, and use that answer to guide your design.
The quality of your ML system is limited by the quality of the data feeding it. It doesn't matter if you're using the most cutting-edge architecture in the world — garbage in, garbage out. Data engineering is the foundation of everything.
You need to decide between processing data in batch or in real time (streaming). Real-time processing has become the key differentiator for companies that need instant responses, such as recommendation systems or fraud detection. On top of that, the way you store data directly affects cost and speed: columnar formats like Parquet allow fast reads and take up less disk space than row-based formats such as CSV.
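The batch-versus-stream decision can be sketched with a toy aggregation. This is an illustrative example, not from the book: the batch version processes a full dataset at once, while the streaming version updates its answer as each event arrives.

```python
# Sketch: the same aggregation computed in batch vs. as a stream.
# Class names (BatchAverager, StreamAverager) are illustrative.

class BatchAverager:
    """Processes a full dataset at once -- cheap, but results lag reality."""
    def average(self, values):
        return sum(values) / len(values)

class StreamAverager:
    """Updates incrementally as each event arrives -- instant answers."""
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count  # current running average

events = [12.0, 15.0, 9.0, 18.0]

batch_result = BatchAverager().average(events)

stream = StreamAverager()
for e in events:
    stream_result = stream.update(e)  # a fresh answer after every event

print(batch_result, stream_result)  # both 13.5 once all events are seen
```

The trade-off the chapter describes shows up even here: the batch answer is simpler to compute and manage, but the streaming answer is available after every single event, which is what fraud detection or live recommendations require.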
Another major challenge is labeling. Getting human-labeled data is expensive and slow. That's why programmatic labeling and weak supervision are gaining more and more ground. A fatal mistake many people make is data leakage — where information the model shouldn't have at prediction time ends up in training, creating a false sense of perfection.
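One common leakage pattern is computing preprocessing statistics over the full dataset, test rows included, before splitting. A minimal sketch of the fix, with illustrative data and variable names:

```python
# Sketch of avoiding one classic leakage pattern: fitting a scaler on
# train + test data lets test information bleed into training.
from statistics import mean, stdev

data = [3.0, 5.0, 4.0, 8.0, 100.0, 6.0]   # last rows held out for test
train, test = data[:4], data[4:]

# WRONG: stats computed over the whole dataset leak the test outlier.
leaky_mu = mean(data)

# RIGHT: fit normalization on training data only, then reuse it on test.
mu, sigma = mean(train), stdev(train)
train_scaled = [(x - mu) / sigma for x in train]
test_scaled = [(x - mu) / sigma for x in test]   # same mu/sigma, no refit

print(mu, leaky_mu)  # the leaky mean is pulled up by the test outlier
```

The same principle extends to any "future" information: anything unavailable at prediction time, including statistics derived from held-out rows, must stay out of training.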
To prevent this and ensure consistency, leading companies use Feature Stores. DoorDash, for instance, uses this approach to make sure the same data used in training is used at delivery time, avoiding wrong predictions on delivery estimates. You can replicate this by creating a centralized feature repository for your team. Today, check whether any "future" information exists in your training set. Test this cleanup for 24 hours and see how your model's performance behaves in a much more realistic way.
When it comes to building the model, the golden rule is: start simple. Don't try to use a deep neural network if logistic regression solves the basic problem. Simple models serve as an essential baseline so you know whether the extra complexity is actually worth the effort.
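The "start simple" advice can be made concrete with the simplest baseline of all: always predict the majority class. The labels below are made up for illustration; the point is that any fancier model must beat this number to justify its complexity.

```python
# Sketch: a majority-class baseline as the "start simple" yardstick.
from collections import Counter

y_train = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
y_test  = [0, 1, 0, 0, 1]

majority = Counter(y_train).most_common(1)[0][0]     # predict this always
baseline_preds = [majority] * len(y_test)

accuracy = sum(p == t for p, t in zip(baseline_preds, y_test)) / len(y_test)
print(f"majority class = {majority}, baseline accuracy = {accuracy:.2f}")
```

If a deep neural network only matches this baseline, the extra training cost, latency, and maintenance burden buy you nothing.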
Once the model is ready, offline evaluation is just the beginning. You use metrics like F1-score and AUC-ROC to get an idea of its potential, but the real test happens at deployment time. You need to choose between batch prediction — cheaper and easier to manage — or online prediction, which delivers immediate results to the user. If you work with mobile devices or sensors, ML on the Edge is the way to go, requiring compression techniques like quantization so the model runs without draining all the battery or memory.
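To make the offline-evaluation step tangible, here is the F1-score mentioned above computed by hand for a binary task. The toy labels are illustrative:

```python
# Minimal sketch of precision, recall, and F1 from raw predictions --
# the offline metrics the text mentions, computed by hand.

def f1_report(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
precision, recall, f1 = f1_report(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

These numbers only estimate potential; as the text stresses, the real test still happens at deployment time.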
Netflix uses very smart deployment strategies to test new algorithms. They often use shadow deployment (shadowing), where the new model receives real data and makes predictions, but the output isn't shown to the user. This validates system behavior without any risk of ruining the customer experience. You can replicate this approach by running your new model in parallel with the old one and comparing results silently for a few days. On your next system update, try a canary release, rolling out the new version to just 5% of users first.
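The shadow-deployment idea above can be sketched in a few lines. The models and router here are illustrative stand-ins, not Netflix's actual stack: the new model sees real traffic and its disagreements are logged, but only the old model's answer ever reaches the user.

```python
# Sketch of shadow deployment: the shadow model runs on real requests,
# but its output is only logged, never shown to the user.
# Both model functions and the router are hypothetical examples.

def old_model(request):
    return "popular_picks"

def new_model(request):
    return "personalized_picks" if request.get("history") else "popular_picks"

disagreements = []

def serve(request):
    live = old_model(request)           # this is what the user sees
    shadow = new_model(request)         # computed silently in parallel
    if shadow != live:
        disagreements.append((request, live, shadow))  # review offline later
    return live

responses = [serve({"history": h}) for h in (True, False, True)]
print(responses, len(disagreements))
```

After a few days of logging, you compare the two prediction streams; a canary release then exposes the new model to a small user slice (say 5%) only once the silent comparison looks safe.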
Monitoring, continuous learning, and ethics
Putting a model into production isn't the finish line — it's just the beginning of a new phase. The world changes, and your model will lose performance over time. This is what we call drift. User behavior changes, new products emerge, and what was true yesterday isn't true today.
That's why constant monitoring is vital. You need to log everything that happens to identify silent failures. The ideal path is toward continuous learning, where the system updates itself with new data automatically, rather than relying on manual retraining every six months.
But be careful: automated systems can amplify biases and injustices if the input data is tainted. Ethics and fairness in ML aren't just "theoretical talk" — they're real business and reputation risks. Data privacy should also be at the center of the project from day one. Google, for example, invests heavily in differential privacy to train models without exposing individual user data. This ensures public trust and system security.
Success in ML demands that you accept the imperfect nature of data and commit to constant evolution. Today, ask your customers or users whether they notice any bias in your system's responses. Try implementing a simple statistics monitor on your input data to detect sudden pattern shifts in the next 24 hours.
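The "simple statistics monitor" suggested above can be as small as a z-score check on a recent window against a reference window. The threshold and sample data are illustrative assumptions, not a production-grade drift detector:

```python
# Sketch of a basic input-statistics monitor: flag drift when a recent
# window's mean shifts too far from a reference window, in z-score terms.
from statistics import mean, stdev

def drifted(reference, recent, z_threshold=3.0):
    mu, sigma = mean(reference), stdev(reference)
    z = abs(mean(recent) - mu) / (sigma or 1.0)  # guard against sigma == 0
    return z > z_threshold

reference = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 10.2, 9.8]
stable    = [10.1, 9.9, 10.3, 9.7]
shifted   = [14.0, 15.2, 13.8, 14.5]

print(drifted(reference, stable), drifted(reference, shifted))
```

Even a crude monitor like this catches sudden distribution shifts before they silently degrade predictions; fancier options (statistical tests, per-feature histograms) build on the same pattern.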
Designing Machine Learning systems is a challenge that blends rigorous engineering with a deep understanding of data and human behavior. The key takeaway here is that a model only has real value when it's in production, generating impact, and being closely monitored. The focus should shift from the isolated algorithm to the full cycle: from ethical and efficient data collection to the continuous updating of the system in the real world. Treat ML as an engineering discipline where robustness and adaptability matter more than technical complexity for its own sake.
To complement your perspective on how to structure large-scale technology systems, we recommend the microbook Designing Data-Intensive Applications, by Martin Kleppmann. It dives deep into the data infrastructure that powers modern applications and will help you better understand how to scale your ML system with safety and performance. Check it out on 12min!